Omics-based integrated analysis identified IKZF2 as a biomarker associated with lupus nephritis

Lupus nephritis (LN) is a crucial complication of systemic lupus erythematosus (SLE). IKZF2 was identified as a lupus susceptibility locus, while its exact molecular function in LN is unknown. We aimed to explore the relationship between IKZF2 and LN based on multi-omics data. In our study, we carried out a meta-analysis of publicly available data, including not only tubulointerstitium, but also glomerulus tissue samples from LN patients and controls. Based on the common differentially expressed genes (co-DEGs) and previous researches, we selected IKZF2 for further analysis. Then, we analyzed potential molecular mechanisms of co-DEGs and IKZF2 in LN. To explore the possible targets of IKZF2, protein–protein interaction network (PPI) network and ceRNA network of IKZF2 were also constructed. Moreover, we performed immune infiltration analysis and evaluated clinical value of IKZF2. A total of 26 co-DEGs were observed in the integration of the above DEGs coming from the four sets of data, of which IKZF2 was selected for further analysis. Functional enrichment analysis from IKZF2 and related PPI network confirmed the tight relationship between IKZF2 and the immune reaction. Moreover, immune filtration analysis revealed the significant correlation between IKZF2 and naïve B cell, NK cell activation, NK cell rest and other immune cells. Receiver operating characteristic (ROC) analysis showed that the areas under the ROC curves were 0.721, 0.80, 0.682, and 0.859 for IKZF2 in four datasets, which demonstrated the clinical value of IKZF2. Our study revealed that IKZF2 may play an essential role in the molecular function and development of LN, and might be a potential biomarker for distinguishing LN patients and healthy ones.

Systemic lupus erythematosus (SLE) is a multifactorial and multisystem autoimmune disease that commonly affects the kidneys 1 . LN is a major driver of mortality of SLE patients 2 , and up to 20% of SLE patients who have been afflicted by LN will finally develop end-stage kidney disease (ESKD) within the first decade of the disease course 3 . However, on the one hand, the clinical detection of early disease can be challenging because patients often lack overt signs of kidney disease 4 , and the conventional parameters such as proteinuria and serum creatinine are insensitive and non-specific 5 . On the other hand, current therapies are not sufficiently efficacious to control LN and not all patients show adequate treatment responses 6 . Thus, it is critical and necessary to search and identify novel candidate biomarkers or therapeutic targets for LN.
Big data genomic assays have advanced our understanding of the molecular heterogeneity of SLE 7 . Genomewide association studies (GWASs) in SLE have revealed more than 80 susceptibility loci [8][9][10] . Nevertheless, the identified risk variants could only explain a small portion of heritability of SLE 11 . Besides, gene expression, as an intermediate phenotype between DNA and disease phenotypic variation, can provide us with a new perspective to our understanding of the molecular mechanisms leading to SLE 12 . IKZF2, also known as HELIOS, was initially thought to be restricted to Treg cells 13,14 . In Treg cells, it was claimed to control IL2 expression by promote binding of Foxp3 to the IL2 promoter 15 . It is expressed in other immune cells, epithelial tissues of the gut, the respiratory tract, and the tubules of the kidney 13 etc. Interestingly, it was also identified as a lupus susceptibility locus by GWASs 9,16 , and a recent research found it was enriched to regulating the related gene for T cells, B cells, and peripheral blood cells (PBCs) in SLE 17 . However, its exact molecular function in LN is still unknown 18 .
In our study, we carried out a meta-analysis of publicly available data including not only tubulointerstitium but also glomerulus tissue samples from LN patients and pre-transplant healthy living donors (LD), and we identified common differentially expressed genes (co-DEGs) for each dataset. Among 26 co-DEGs, we selected IKZF2 for further analysis based on the GWASs 8-10 and previous researches 9, [13][14][15][16][17][18] . Then, we analyzed potential molecular mechanisms of co-DEGs and IKZF2 in LN. Protein-protein interaction network (PPI) network and ceRNA network of IKZF2 were also constructed to explore the possible targets of IKZF2. Moreover, we performed immune infiltration analysis and evaluated clinical value of IKZF2. Over all, this is the first study applying multiple datasets to understand the molecular mechanism of IKZF2 in LN in detail.

Materials and methods
Data collection. We mined the Gene Expression Omnibus (GEO) database 19 to find publicly available gene expression datasets for LN including both tubulointerstitium or glomerulus tissue samples. The datasets of GSE32591 20 and GSE113342 21 were selected and downloaded by the R package GEOquery 22 . Each dataset was divided into two groups of tubulointerstitium (TUB) or glomerulus (GLOM) for subsequent analysis. For details of the LN cases and LD included in the two gene expression profiles, see Table 1.

Common differentially expressed genes (co-DEGs). Differentially expressed genes (DEGs) between
LN cases and LD were identified by R package Limma 23 in the two gene expression profiles for the tubulointerstitium and glomerulus tissue type, respectively. The criteria of DEGs was defined with adjust P-value < 0.05, and up/down-regulated DEGs was defined with adjust p-value < 0.05 and |logFC|> 1. Then, the co-DEGs was the intersection of the four gene sets. Thus, The DEGs of each dataset were also integrated with R package "RobustRankAggreg" 24 , one meta-analysis method, in order to make the result robust. In all, the results of DEGs were visualized by the R package ggplot2 25 , pheatmap 26 , and VennDiagram 27 .
Functional enrichment analysis of co-DEGs. Gene Ontology (GO) analysis is commonly conducted for large-scale functional enrichment study 28 . Kyoto Encyclopedia of Genes and Genomes (KEGG, www. kegg. jp/ kegg/ kegg1. html) pathway is a database regarding diseases, genomes, biological pathways, chemical and drugs substances 29 . Disease Ontology (DO) 30 annotates genes based on human diseases, helping to transform molecular discovery from high-throughput data into clinical relevance. We used the R package cluster Profiler 31 to perform GO and KEGG analysis, and used the R package DOSE 32 to conduct DO annotation. P-value < 0.05 was considered to be statistically significant for enrichment analysis.
IKZF2-associated protein-protein interaction (PPI) network. STRING 33 is a public data source of known and predicted protein-protein interactions (PPI), from which we extracted the PPI of IKZF2-associated genes and used cytoscape 34 to perform visualization. Then, we applied cytoscape plugin Molecular Complex  35 to identified highly interconnected clusters in the IKZF2-associated PPI network, and annotated GO terms of the specific cluster by ClueGO 36 plug-in. Besides, the Database for Annotation, Visualization, and Integrate Discovery (DAVID) 37 and Gene Set Enrichment Analysis (GSEA) 38 were also used to understand biological significance underlying the IKZF2-associated PPI network.
Construction of ceRNA network. To further analyze the potential targets and biological function of IKZF2 in LN, we constructed a competing endogenous RNA (ceRNA) 39 network related to IKZF2 including the mRNA, miRNA and lncRNA. The prediction of IKZF2-related miRNA targets was obtained from the starBase (http:// starb ase. sysu. edu. cn/ index. php) 40 , miRDB (http:// www. mirdb. org/) 41 44 . Based on miRNA-mRNA and miRNA-LncRNA, a ceRNA network was established and visualized using Cytoscape. www.nature.com/scientificreports/ Immune infiltration analysis. For the co-DEGs between LN cases and LD of the tubulointerstitium or glomerulus group in the two datasets, immune infiltration analysis was performed using CIBERSORT 45 , which was based linear support vector regression to deconvolute relative abundance of 22 infiltrated immune cells. Principal components analysis (PCA) was applied to illustrate the differences in immune infiltrating cells between LN cases and LD, and the stromal/immune scores of each sample were shown by the R package pheatmap 26 . Besides, the correlation between the IKZF2 expression and the types of 22 immune cell was determined by Spearman's correlation test 46 .
Clinical characteristics correlation analysis and relative risk assessment. Spearman

Results
The identification of co-DEGs. To elucidate DEG between the sample from the LN cases and LD, the four samples from GSE32951 and GSE113342 datasets was analyzed using R package limma to identify the DEG and its related up/down-regulated genes according to the cutoff threshold (adjust P-value < 0.05, and |logFC|> 1 ( Table 2 and Fig. 1A-D). A total of 26 co-DEGs were observed in the integration of the four sets of data ( Fig. 2A). The color of heatmap represents each co-DEG expression in different sample ( Fig. 3A-D). These results revealed that the 26 co-DEGs can easily distinguished LN samples from LD.
Based on the GWASs 8-10 and previous researches 9,13-18 , we selected IKZF2 as the focus of the follow-up research. Compared with normal controls, the expression value of IKZF2 was significantly downregulated in the LN samples (Fig. 2B). Besides, IKZF2 was also shown in the first 10 downregulated DEGs of all datasets determined by "RobustRankAggreg" (Fig. 2C).
Functional enrichment analysis of co-DEGs. In order to get further insight into the correlation between co-DEGs and biological processes, molecular functions, cellular components, biological pathways and specific diseases, functional enrichment was conducted. Firstly, GO analysis was conducted to reveal the biologi- www.nature.com/scientificreports/ cal characteristics of 26 co-DEGs. In the biological process category, these co-DEGs were mainly enriched in the mononuclear cell proliferation, leukocyte proliferation, regulation of inflammatory, etc. And from the point of molecular function, the hub mostly enriched GO terms included cytokine receptor binding, immune receptor activity, tumor necrosis factor and death receptor. Moreover, the cellular component category exhibited that co-DEGs mainly concentrated on the secretory granule membrane, membrane raft, membrane microdomain and membrane region ( Fig. 4 A and Table 3). The KEGG analysis upon the 26 co-DEGs showed that Epstein-Barr virus infection, complement and coagulation cascades and osteoclast differentiation, were mostly enriched terms of co-DEG ( Fig. 4B and Table 4). The details of the significantly enriched pathways related to SLE were especially presented in Fig. 4E (https:// www. kegg. jp/ entry/ map05 322). The 26 co-DEGs were also mainly enriched in a number of signal pathways, which were highly related to cell proliferation and cell senescence including NF − kappa B signaling pathway, MAPK signaling pathway, Th17 cell differentiation and Toll − like receptor signaling pathway (Fig. 4B\D). To explore the potential effects of co-DEGs upon disease, DO analysis was performed. The results depicted that co-DEGs were mostly enriched in kidney disease, urinary system disease, lymphoblastic leukemia, membranous glomerulonephritis and Human immunodeficiency virus infectious disease. (Fig. 4C and Table 5).    5C and Table 6). Moreover, categorization by signal pathway revealed that Chronic myeloid leukemia, ErbB signal pathway, Glioma, Table 3. GO analysis of co-DEGs.   www.nature.com/scientificreports/  Table 7). In order to further verify the latent functional implication of IKZF2 correlated genes in the PPI network, GESA analysis was conducted, and four mostly enriched GO terms were observed, including Cell intima binding organelle, Multicellular biological development, Multicellular biological process and Animal organ development (Fig. 5E). In addition, the genes in PPI network were mainly enriched in PI3K-Akt signal pathway, The cancer related pathway, JAK-STAT signal pathway and Ras signaling pathway (Fig. 5F). Next, we applied cytoscape plugin Molecular Complex Detection (MCODE) to identify highly interplayed clusters in the IKZF2 associated PPI network and finally obtained two functional clusters. And then, we utilized clueGO to annotate functional cluster 1 for its higher score (Fig. 5D). The results showed that multiple genes in functional cluster 1 were mainly enriched in two or three kinds of biological functions and pathways.
Construction of the ceRNA regulatory network. We next set out to better comprehend the potential regulation mechanism of IKZF2 in the progression of LN by establishing IKZF2 related ceRNA regulatory network. First, we utilized four databases to predict miRNA, which targeted the gene expression of IKZF2 and eventually obtained 22 unique miRNAs in all databases (Fig. 6A). Subsequently, 1178 LncRNA interplaying with the above 22 miRNAs were forecast based on miRNet database. And finally, cytoscape software was utilized for visualizing ceRNA regulatory network, which was composed of lncRNA-miRNA-mRNA interaction information. The cytohub 48 plug-in in cytoscape was used to identify the key lncRNA-miRNA-mRNA regulatory mechanism in the ceRNA network ( Fig. 7A and Supplementary1_miRNA.xls).  www.nature.com/scientificreports/ ◂ Immune infiltration analysis. We further analyzed the relative abundance of immune infiltration of co-DEGs in 22 immune cells types from LN and normal tissues. The results exhibited that there was a significant distinction between LN samples and LD in the GLOM group from GSE32591 dataset (Fig. 6C). PCA analysis also confirmed this difference of immune cell infiltration between LN samples and LD (Fig. 6D). Subsequently, we conducted statistical analysis upon the 22 immune cells types to determine their difference in the two groups. Statistical analysis revealed that the significant difference of abundance in the 22 immune cells types were mainly focus on the following immune cells: neutrophils, monocytes, eosinophils and macrophages M0 (Fig. 6B). Moreover, Spearman correlation was calculated to reveal the link between 22 types of immune cells in LN tissues. The results indicated four immune cell subsets, including naive T cells, monocytes, neutrophils, and γδT cells, were significantly correlated (Fig. 6E). The results also revealed the significant correlation between IKZF2 and naïve B cell, NK cell activation, NK cell rest and other immune cells. (Fig. 6F).
In addition, the immune infiltration between LN/LD samples from other group of datasets, including TUB group in the dataset GSE32591 (Fig. 6G-K), GLOM group ( Fig. 7B-F) and TUB group (Fig. 7G-K) in the dataset GSE113342 was also analyzed in our study. Overall, in TUB group between LN/LD samples, the significant difference of abundance in the 22 immune cells types were mainly focus on the following immune cells (Cor > 0.5): γδT cells, T follicular helper, M1 Macrophage, Mast cell resting and Dendritic cell activated. In GLOM group between LN/LD samples, the results obviously showed the significant correlation between IKZF2 and kinds of T cell, NK cell activated, neutrophils, and other immune cells (Figs. 6, 7).
Clinical characteristics correlation analysis and relative risk assessment. In order to verify the clinical significance of IKZF2, we preformed correlation analysis between the gene expression level of IKZF2 and the age of LN patients and related LN class. Although, no significant correlation was found between the IKZF2 expression level and the age of relative patients (Fig. 8A\C), the results exhibited that the IKZF2 expression was related with the disease grade of LN.
In the GLOM group, higher LN class revealed higher expression of IKZF2. In the TUB group, on the contrary, higher disease grade of LN denoted lower expression of IKZF2. (Fig. 8B\D) After that, we next explored whether IKZF2 could distinguish LN samples and LD. The areas under the ROC curves were 0.721, 0.80, 0.682, and 0.859 for IKZF2 in GSE32591_GLOM, GSE32591_TUB,GSE113342_GLOM, and GSE113342_TUB datasets, which implicated that IKZF2 might be a potential biomarker for distinguishing LN patients and healthy ones. (Fig. 8E-H).
Validation with other GEO datasets. To verify our conclusion in a broader range, we analyzed the differentially expressed level of IKZF2 between LN and normal by datasets of GSE98422 49 and GSE104948 50 (Fig. 9A-C). The areas under the ROC curves were 0.921 for IKZF2 in GSE98422. And all datasets showed that IKZF2 was consistently downregulated in LN, illustrating a satisfactory reliability of our research.
Moreover, we valued IKZF2 expression between IgA Nephropathy and normal by datasets of GSE104948 50 and GSE115857 (Fig. 9D\F), and IKZF2 didn't exhibit significant transcriptional differences in the two datasets. We also examined the differentially expressed level of IKZF2 between ANCA associated vasculitis and normal by datasets of GSE104948 50 and GSE108113 50 (Fig. 9E\G\H). IKZF2 was downregulated in ANCA associated vasculitis of GSE104948, but no significant difference was found in GSE108113.

Discussion
SLE is a chronic autoimmune disease characterized by loss of immune tolerance and various organ or tissue damage. About 25-50% patients diagnosed with SLE have symptoms of LN at onset, and approximately 60% of adult patients with SLE develop renal signs or symptoms during the disease course. LN remains/ed a major cause of morbidity and death for SLE patients 51 . Laboratory test has made great strides in diagnosis and therapy, but the incidence of renal flares ranged from 27 to 66% 52 . However, SLE is a multifactorial process with potential involvement of different organs, and varied in its clinical severity, and the pathogenesis remains unclear. Hence, more accurate prediction and evaluation remain the great challenges for the improvement of LN outcomes.
The tissue injury or clinical manifestations are influenced by genetic, epigenetic, environmental, hormonal, and immunoregulatory factors 2 . Many genes which encode related proteins have been found potential candidates associated with SLE, such as DNase I, DNase γ, DNase III and TREX1 involved in DNA clearance, C1q and C4 od the complement pathway and other functions (ACP5, AGS5, FASL and STING) 51 and Fc receptors 53    www.nature.com/scientificreports/ including type I interferon production (IRF5, IRF7, TLR7, TLR8, TLR9 and STAT4) 54 , nuclear factor-kappa B (NF-κB) signaling pathway 55 and effector and regulatory CD4( +) T cells especially Th 17 cells 56 . People has focused on the possibility that viruses may trigger SLE for decades. It suggested that CD8 + T cells are unable to control Epstein-Barr virus (EBV)-infected B cells in SLE patients, because of the higher viral load and faster seroconversion to infection and the molecular similarity between EBV nuclear antigen 1 and the common lupus autoantigen Ro 2,57-59 .
In the current study, gene expression analysis was applied to identify co-DEGs and possible predictor factors between LN patients with healthy controls. A total of 26 co-DEGs, 717 miRNA and 10,125 lncRNA were identified and the PPI network was conducted. Most of the variable gene and pathways involved have been previously described in previous researches. Based on the analysis of co-DEGs expression, the IKZF2 was found to be down-regulated in the GLOM group and TUB group of the disease group. In order to explore the potential function of IKZF2 in LN patients, we analyzed the abundance of co-DEGs in 22 types of immune cells in LN tissues and normal tissues. The IKZF2 was correlated with immune cells including initial B cells and NK cell. The result showed the excellent value of IKZF2 for the AUC and we can substantially distinguish LN and healthy controls tissues.
The IKZF2 gene encodes a member of the Ikaros family of zinc-finger proteins including Ikaros, Aiolos and HELIOS, which are hematopoietic-specific transcription factors involved in the regulation of lymphocyte development 60 . IKZF2, also known as HELIOS, a transcription factor expressing in 60-70% Foxp3þ Treg cells, recently has been demonstrated that it played a role in controlling certain aspects of Treg-suppressive function, differentiation, and survival 61 . Treg cells play an important role in maintaining immune self-tolerance by suppressing effector T and B cells. And it's believed that abnormalities of Treg cells may lead to T and B cell hyperactivity and contribute to the pathogenesis of SLE. It's reported that Tregs expressing IKZF2 have stronger immunosuppressive capacity 62 . In vitro experiments, it was found that IKZF2 deficient mice would acquire an auto-inflammatory phenotype similar to rheumatoid arthritis. And activated CD4 + and CD8 + T-cells, T-follicular helper cells and germinal center B-cells were noticed increased leading to production of autoantibody in circulation 63 . In recent study, CD4 + CD25 − Foxp3 + T cells major expressing IKZF2 were indicated significantly increased in active SLE patients. Detailed cohort analysis revealed increased proportions of CD4 + CD25 − Foxp3 + T cells in SLE patients especially LN patients. And the same type T cell was detected in urine sediment samples of patients with active glomerulonephritis and correlated with the extent of proteinuria 64 . Related data suggested that association between the percentage of FoxP3 + IKZF2 + Treg cells and the SLEDAI score is positive. There is a higher percentage of FoxP3 + IKZF2 + Treg cells in patients with more active disease compared to the patients with inactive disease or healthy donors 62 . Thus, it's assumed that CD4 + CD25 − Foxp3 + T cells might serve as an important tool to recognize and monitor SLE patients with renal involvement. However, IKZF2 is considered as a marker of T cell activation and proliferation 62,65 . It's possible that IKZF2 might become a new index for prediction and evaluation of LN.
Our study is the first study to explore the molecular mechanisms of IKZF2 in LN by multiple datasets and applying the immune infiltration analysis. Moreover, the AUC of IKZF2 expression had very high diagnostic value in each dataset based on ROC curves, revealing it might be a potential biomarker for distinguishing LN patients and healthy ones. However, due to the ethical and traumatic reasons, SLE patients without nephritis may not be punctured by the kidney. We cannot compare patients with nephritis against patients without nephritis. Nevertheless, we tried our best to obtain all available data and also analyzed clinical characteristics correlation and relative risk assessment. In addition, we valued IKZF2 expression in IgA Nephropathy and ANCA associated vasculitis, verifying our conclusion in a broader range. Besides, further experimental verifications are necessary to illuminate the biological functions of IKZF2 in LN. Subsequently, we would like to perform more careful examinations of the function of IKZF2 in LN by combining in vivo and in vitro experiments. Therefore, we consider that our study may help to investigate the progress of LN, and that IKZF2 may become biomarkers and therapeutic targets of LN in the future.

Conclusion
The absence of our patient cohort makes it difficult to verify the results. However, previous researches focused on the expression of IKZF2 in peripheral blood to evaluate the prognosis of the disease, and our study provided some evidence and new ideas about new sights for prediction and evaluation of LN. Our bioinformatics analysis identified the gene IKZF2 may be critical in the prediction and evaluation of disease activity. We hope this study may provide some potential evidence and ideas for the future treatment and monitoring of LN from new insights. www.nature.com/scientificreports/